TEZ-4521: Partition stats should be always uncompressed size #317
🎊 +1 overall
This message was automatically generated.
  // Use partition sizes to compute the total size.
  if (partitionSizes != null) {
-   totalSize = estimatedUncompressedSum(partitionSizes);
+   totalSize = Arrays.stream(partitionSizes).sum();
Does totalSize change with this patch? If it doesn't, why not? If it does, can we validate it with this unit test, or in any way that makes sense to you, @okumin?
This patch doesn't change the total size. That's because the total size is stored in a different Protobuf field from the partition stats. The design is valid since users have the option not to collect partition stats at all (tez.runtime.report.partition.stats=none).
TEZ-4521 would remove the possibility that partition stats contain the compressed size. That's why I revised this file, to prevent future users from being confused.
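To illustrate the point above, here is a minimal standalone sketch, not the actual Tez code. The class `PartitionSizeSketch`, the holder `OutputReport`, and the method `report` are hypothetical names made up for this example. It assumes partition sizes are already uncompressed, so the total is a plain sum, and it models the total size and the partition stats as two independent fields so that the total is still reported when partition stats are turned off.

```java
import java.util.Arrays;

public class PartitionSizeSketch {

    /** Hypothetical holder: total size and partition stats live in separate fields. */
    static final class OutputReport {
        final long totalSize;        // always reported
        final long[] partitionStats; // null when partition stats are not collected

        OutputReport(long totalSize, long[] partitionStats) {
            this.totalSize = totalSize;
            this.partitionStats = partitionStats;
        }
    }

    /**
     * Builds a report from per-partition sizes that are assumed to be uncompressed.
     * The total is a plain sum, so no compressed-to-uncompressed estimation is needed.
     */
    static OutputReport report(long[] uncompressedPartitionSizes, boolean reportPartitionStats) {
        long total = Arrays.stream(uncompressedPartitionSizes).sum();
        long[] stats = reportPartitionStats ? uncompressedPartitionSizes.clone() : null;
        return new OutputReport(total, stats);
    }

    public static void main(String[] args) {
        long[] sizes = {100L, 250L, 50L};
        OutputReport withStats = report(sizes, true);
        OutputReport withoutStats = report(sizes, false);
        // The total size is the same whether or not partition stats are reported.
        System.out.println(withStats.totalSize == withoutStats.totalSize);
        System.out.println(withoutStats.partitionStats == null);
    }
}
```

In this sketch, disabling partition stats only drops the per-partition array; the total is computed and carried separately, which matches the reasoning for why totalSize does not change.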
thanks @okumin for this patch, I've put a minor comment
@abstractdog Thanks! This patch is related to #306 and I'd be glad if you could take a look at it.
merged, thanks @okumin
Currently, we need to configure the compressed size for ordered outputs and the decompressed size for unordered outputs. It would make sense to have consistent semantics.
https://issues.apache.org/jira/browse/TEZ-4521